Explore the market of public catering establishments in Moscow and find interesting features that will help in choosing a suitable place to open a new establishment.
Investors of the Shut Up and Take My Money fund decided to open a catering establishment in Moscow. The location, menu, prices and type of establishment have not yet been determined.
The task is to prepare a study of the Moscow market, find interesting features, and present the results, which will help investors choose a suitable place. We have a dataset of public catering establishments in Moscow, compiled from Yandex Maps and Yandex Business data for the summer of 2022.
File moscow_places.csv:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly import graph_objects as go
import folium
from folium import Map, Choropleth, Marker
from folium.plugins import MarkerCluster
from folium.features import CustomIcon
from numpy import median
import re
import os
import json
pth1 = 'moscow_places.csv'
pth2 = 'datasets/moscow_places.csv'
if os.path.exists(pth1):
    data = pd.read_csv(pth1, sep=',')
elif os.path.exists(pth2):
    data = pd.read_csv(pth2, sep=',')
else:
    print('Path not found')
#
print('------------- First 5 lines ------------')
display(data.sample(5))
print('------------- Data types ------------')
display(data.info())
print('------------- Gaps ------------')
for element in data.columns:
    if data[element].isna().any():
        print(element, '-', data[element].isna().sum())
    else:
        print(element, '- None')
print('------------- Duplicates ------------')
if data.duplicated().sum() > 0:
    print(data.duplicated().sum())
else:
    print('No Duplicates');
------------- First 5 lines ------------
| | name | category | address | district | hours | lat | lng | rating | price | avg_bill | middle_avg_bill | middle_coffee_cup | chain | seats |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2700 | Додо Пицца | пиццерия | Москва, Уральская улица, 5А | Восточный административный округ | ежедневно, 10:00–22:30 | 55.813749 | 37.797604 | 4.3 | NaN | Средний счёт:422 ₽ | 422.0 | NaN | 1 | 24.0 |
| 4336 | Encore Cafe | ресторан | Москва, Гороховский переулок, 12, стр. 5 | Центральный административный округ | ежедневно, 09:00–23:00 | 55.764668 | 37.667043 | 4.4 | NaN | NaN | NaN | NaN | 0 | NaN |
| 6950 | Грузинская кухня | кафе | Москва, Херсонская улица, 20, корп. 1 | Юго-Западный административный округ | NaN | 55.652810 | 37.564495 | 4.4 | NaN | NaN | NaN | NaN | 1 | 50.0 |
| 4512 | Экспедиция. Северная кухня | ресторан | Москва, Певческий переулок, 6 | Центральный административный округ | ежедневно, 12:00–00:00 | 55.751718 | 37.641690 | 4.8 | NaN | NaN | NaN | NaN | 0 | 85.0 |
| 1543 | Бургер Кинг | ресторан | Москва, Ленинградское шоссе, 16А, стр. 4 | Северный административный округ | ежедневно, 10:00–22:50 | 55.823380 | 37.496938 | 4.3 | низкие | Средний счёт:300 ₽ | 300.0 | NaN | 1 | 230.0 |
------------- Data types ------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8406 entries, 0 to 8405
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   name               8406 non-null   object
 1   category           8406 non-null   object
 2   address            8406 non-null   object
 3   district           8406 non-null   object
 4   hours              7870 non-null   object
 5   lat                8406 non-null   float64
 6   lng                8406 non-null   float64
 7   rating             8406 non-null   float64
 8   price              3315 non-null   object
 9   avg_bill           3816 non-null   object
 10  middle_avg_bill    3149 non-null   float64
 11  middle_coffee_cup  535 non-null    float64
 12  chain              8406 non-null   int64
 13  seats              4795 non-null   float64
dtypes: float64(6), int64(1), object(7)
memory usage: 919.5+ KB
None
------------- Gaps ------------
name - None
category - None
address - None
district - None
hours - 536
lat - None
lng - None
rating - None
price - 5091
avg_bill - 4590
middle_avg_bill - 5257
middle_coffee_cup - 7871
chain - None
seats - 3611
------------- Duplicates ------------
No Duplicates
The dataset contains 8406 establishments and no exact duplicates. Several secondary columns have gaps; we leave them unfilled at this stage, since filling them may turn out to be unnecessary.
data['street'] = data['address'].str.split(', ').apply(lambda x: x[1])
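def check_247(value):
    # 'круглосуточно' = round the clock, 'ежедневно' = daily
    try:
        if ('круглосуточно' in value) and ('ежедневно' in value):
            return True
        return False
    # Missing hours are NaN (float), so the `in` test raises TypeError
    except TypeError:
        return False
data['is_24/7'] = data['hours'].apply(check_247)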
# Let's check the correctness of the new columns
data[['street','is_24/7']].describe()
| | street | is_24/7 |
|---|---|---|
| count | 8406 | 8406 |
| unique | 1448 | 2 |
| top | проспект Мира | False |
| freq | 184 | 7676 |
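The same flag can be computed without a row-wise `apply`. The following is a minimal sketch on a made-up `hours` series (the values are illustrative, not taken from the dataset); it assumes, as `check_247` does, that an establishment is open 24/7 when its hours contain both 'ежедневно' and 'круглосуточно'.

```python
import pandas as pd

# Made-up sample of the `hours` column, including a missing value
hours = pd.Series([
    'ежедневно, круглосуточно',
    'ежедневно, 10:00–22:00',
    None,
    'пн-пт круглосуточно',
])

# na=False turns missing hours into False instead of propagating NaN
is_247 = (hours.str.contains('ежедневно', na=False)
          & hours.str.contains('круглосуточно', na=False))
print(is_247.tolist())
```

Only the first entry satisfies both conditions; missing hours are treated as not 24/7, matching the behaviour of the function above.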
# Convert the name column to lowercase and replace dots with spaces
data['name'] = data['name'].str.lower()
data['name'] = data['name'].replace(r'\.', ' ', regex=True)
# Translate category labels into English
data = data.replace({
    'category': {
        'бар,паб': 'bar,pub',
        'булочная': 'bakery',
        'быстрое питание': 'fast food',
        'кафе': 'cafe',
        'кофейня': 'coffee_shop',
        'пиццерия': 'pizzeria',
        'ресторан': 'restaurant',
        'столовая': 'canteen',
    }
})
b = data.pivot_table(index='name', columns = 'category', values = 'lat', aggfunc = 'count')
b = b.fillna(0.0)
b['category_major'] = b.apply(lambda row: row.idxmax(), axis=1)
b = b.reset_index()
display(b.sample(5))
| category | name | bakery | bar,pub | cafe | canteen | coffee_shop | fast food | pizzeria | restaurant | category_major |
|---|---|---|---|---|---|---|---|---|---|---|
| 1318 | romashoff | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | coffee_shop |
| 2223 | вкусняша | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | restaurant |
| 2789 | кап кап | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | coffee_shop |
| 2928 | кафе соляночка | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | cafe |
| 2534 | домовёнок кузя | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | cafe |
display(b[b['name'] =='кафе'])
| category | name | bakery | bar,pub | cafe | canteen | coffee_shop | fast food | pizzeria | restaurant | category_major |
|---|---|---|---|---|---|---|---|---|---|---|
| 2825 | кафе | 0.0 | 2.0 | 159.0 | 6.0 | 6.0 | 7.0 | 1.0 | 8.0 | cafe |
# set the same categories for each chain of establishments
b = b[['name','category_major']]
data = data.merge(b, on ='name', how='left')
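The pivot-and-`idxmax` step above assigns each name its most frequent category. An equivalent formulation with `groupby` is sketched below on made-up rows; the frame `toy` and its values are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Made-up rows: one name listed under two different categories
toy = pd.DataFrame({
    'name': ['кафе', 'кафе', 'кафе', 'prime', 'prime'],
    'category': ['cafe', 'cafe', 'bar,pub', 'restaurant', 'restaurant'],
})

# For each name, keep the category that occurs most often
major = (toy.groupby('name')['category']
            .agg(lambda s: s.value_counts().idxmax())
            .rename('category_major')
            .reset_index())

# Attach the dominant category back to every row of that name
labelled = toy.merge(major, on='name', how='left')
print(labelled)
```

When two categories tie for a name, the winner depends on `value_counts` ordering, so ties may deserve a manual check.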
# Replace district names with abbreviations
distr_dict = {'Северный административный округ':'САО',
'Северо-Восточный административный округ':'СВАО',
'Северо-Западный административный округ':'СЗАО',
'Западный административный округ':'ЗАО',
'Центральный административный округ':'ЦАО',
'Восточный административный округ':'ВАО',
'Юго-Восточный административный округ':'ЮВАО',
'Южный административный округ':'ЮАО',
'Юго-Западный административный округ':'ЮЗАО'}
data['district_abb'] = data['district'].apply(lambda x: distr_dict[x] if x in distr_dict.keys() else "ERROR")
display(data.sample(5))
| | name | category | address | district | hours | lat | lng | rating | price | avg_bill | middle_avg_bill | middle_coffee_cup | chain | seats | street | is_24/7 | category_major | district_abb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3859 | prime | restaurant | Москва, улица Сергея Макеева, 13 | Центральный административный округ | пн-пт 08:00–19:00 | 55.763369 | 37.551502 | 4.1 | средние | Средний счёт:300 ₽ | 300.0 | NaN | 1 | 125.0 | улица Сергея Макеева | False | restaurant | ЦАО |
| 3502 | рецептор | cafe | Москва, Большой Козихинский переулок, 10 | Центральный административный округ | ежедневно, 10:00–00:00 | 55.762990 | 37.597081 | 4.4 | выше среднего | Средний счёт:1000–1500 ₽ | 1250.0 | NaN | 0 | 80.0 | Большой Козихинский переулок | False | cafe | ЦАО |
| 3283 | mucho | restaurant | Москва, Береговой проезд, 5А, корп. 2 | Западный административный округ | ежедневно, 12:00–00:00 | 55.755495 | 37.509245 | 4.1 | выше среднего | Средний счёт:1000–1500 ₽ | 1250.0 | NaN | 0 | NaN | Береговой проезд | False | restaurant | ЗАО |
| 6080 | fibo pasta & ravioli | fast food | Москва, улица Вавилова, 64/1с1 | Юго-Западный административный округ | пн-чт 10:00–22:00; пт,сб 10:00–23:00; вс 10:00... | 55.684124 | 37.550262 | 4.5 | средние | Средний счёт:300–1000 ₽ | 650.0 | NaN | 1 | NaN | улица Вавилова | False | cafe | ЮЗАО |
| 3529 | колобок | canteen | Москва, Баррикадная улица, 8, стр. 5А | Центральный административный округ | пн-пт 07:00–18:00 | 55.761943 | 37.582159 | 4.2 | средние | Средний счёт:280–450 ₽ | 365.0 | NaN | 0 | NaN | Баррикадная улица | False | canteen | ЦАО |
Gaps in hours (536 values, about 6%) can be ignored here: working hours are not a critical characteristic in this study. The price, avg_bill, middle_avg_bill and middle_coffee_cup columns are missing between 55% and 94% of their values (4590 to 7871 of 8406 rows). Gaps on that scale cannot be reliably filled, and the data are not critical either, since we were not asked to determine the exact cost of a cup of coffee; what remains is enough to describe the general price category. The seats column is missing 43% of its values (3611 rows) and is likewise not studied in detail here.
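The share of missing values per column can be computed directly rather than by hand. The sketch below wraps this in a hypothetical helper, `missing_share`, and runs it on a small made-up frame instead of the real dataset.

```python
import pandas as pd

def missing_share(df):
    """Return the percentage of missing values per column, largest first."""
    return df.isna().mean().mul(100).round(1).sort_values(ascending=False)

# Made-up frame with the same kind of gaps as the real data
toy = pd.DataFrame({
    'hours': ['ежедневно, 10:00–22:00', None, 'пн-пт 08:00–19:00', 'ежедневно, круглосуточно'],
    'price': [None, None, None, 'средние'],
    'rating': [4.3, 4.4, 4.1, 4.8],
})

shares = missing_share(toy)
print(shares)
```

Calling `missing_share(data)` on the full dataset should give the percentages discussed above.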
Names are converted to lowercase to catch non-obvious duplicates.
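A possible follow-up check for non-obvious duplicates is sketched below. The helper `find_implicit_duplicates` is hypothetical, and it assumes that a name and address pair identifies an establishment; the sample rows are made up.

```python
import pandas as pd

def find_implicit_duplicates(df, keys=('name', 'address')):
    """Return rows whose key columns match after lowercasing and trimming."""
    normalized = df[list(keys)].apply(lambda col: col.str.lower().str.strip())
    return df[normalized.duplicated(keep=False)]

# Made-up rows: the first two differ only in case and trailing whitespace
toy = pd.DataFrame({
    'name': ['Кафе', 'кафе ', 'Prime'],
    'address': ['Москва, улица Вавилова, 64',
                'москва, улица вавилова, 64',
                'Москва, Певческий переулок, 6'],
})

dupes = find_implicit_duplicates(toy)
print(dupes)
```

The original frame is left unchanged; only the comparison uses the normalized values.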
px.defaults.template = "simple_white"
cat = data['category'].value_counts().reset_index()
fig = px.pie(cat, values = cat['category'], names=cat['index'],
title='Establishments category distribution', color_discrete_sequence= px.colors.qualitative.Vivid)
fig
order=data.groupby('category').mean().sort_values('seats', ascending=False).index
order
Index(['bar,pub', 'restaurant', 'coffee_shop', 'canteen', 'fast food', 'cafe',
'pizzeria', 'bakery'],
dtype='object', name='category')
sns.set_style("ticks")
sns.set_palette('Paired')
plt.figure(figsize=(16,6))
order=data.groupby('category').mean().sort_values('seats', ascending=False).index
fig = sns.barplot(x='category', y='seats', data=data, order=order, ci=None)
fig.set_title('Number of seats in establishments by category')
fig.set_xlabel('Category')
fig.set_ylabel('Number of seats');
data = data.replace({'chain':{0:'non-chain',1:'chain'}})
ch = data['chain'].value_counts().reset_index()
fig = px.pie(ch, values = ch['chain'], names = ch['index'],
title='Chain and non-chain establishments ratio', color_discrete_sequence= px.colors.qualitative.Vivid)
fig
d = data.pivot_table(index='category',columns='chain', values='lat', aggfunc='count').reset_index()
d['sum'] = d['chain']+d['non-chain']
d['chain_ratio'] = d['chain']/d['sum']
d = d.sort_values('chain_ratio', ascending=False)
d['chain_ratio'] = d['chain_ratio'].astype(str)
fig = px.histogram(d, x='category', y='chain_ratio',
color_discrete_sequence= px.colors.qualitative.Vivid)
fig.update_layout(title='Categories of chain establishments',
xaxis_title='Categories',
yaxis_title='Ratio of chain establishments')
fig.show()
Build a list of the top 15 chains by number of locations
d1 = data[data['chain']=='chain']['name'].value_counts().reset_index()
chain15 = d1.loc[:14,'index']
display(d1.head(15))
| | index | name |
|---|---|---|
| 0 | шоколадница | 120 |
| 1 | домино'с пицца | 76 |
| 2 | додо пицца | 74 |
| 3 | яндекс лавка | 72 |
| 4 | one price coffee | 71 |
| 5 | cofix | 65 |
| 6 | prime | 50 |
| 7 | хинкальная | 44 |
| 8 | кофепорт | 42 |
| 9 | кулинарная лавка братьев караваевых | 39 |
| 10 | теремок | 38 |
| 11 | чайхана | 37 |
| 12 | cofefest | 32 |
| 13 | буханка | 32 |
| 14 | му-му | 27 |
Plot the corresponding histogram of the most popular chains
d2 = data.query('chain =="chain" & name in @chain15')
fig = px.histogram(d2, x='name', color='category',
color_discrete_sequence= px.colors.qualitative.Vivid)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(title='Top 15 most popular chains in Moscow',
yaxis_title='Number of establishments',
xaxis_title='Name of establishments')
fig.show()
d5 = d2.groupby(['district_abb','category'])['lat'].count().reset_index()
fig = px.bar(d5, x='district_abb', y= 'lat', color='category',
color_discrete_sequence= px.colors.qualitative.Vivid)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(title='Distribution of the top 15 chain establishments by districts of Moscow',
yaxis_title='Number of establishments',
xaxis_title='Administrative district of Moscow')
fig.show()
display(d2.groupby('name').agg({'category':'first',
'rating':'median',
'price':'first',
'middle_avg_bill':'median',
'middle_coffee_cup':'median',
'lat':'count'}).reset_index().sort_values('lat', ascending=False))
| | name | category | rating | price | middle_avg_bill | middle_coffee_cup | lat |
|---|---|---|---|---|---|---|---|
| 13 | шоколадница | coffee_shop | 4.20 | средние | 650.0 | 256.0 | 120 |
| 6 | домино'с пицца | pizzeria | 4.20 | средние | 500.0 | NaN | 76 |
| 5 | додо пицца | pizzeria | 4.30 | средние | 391.5 | NaN | 74 |
| 14 | яндекс лавка | restaurant | 4.00 | None | NaN | NaN | 72 |
| 2 | one price coffee | coffee_shop | 4.20 | средние | NaN | 80.0 | 71 |
| 1 | cofix | coffee_shop | 4.10 | средние | NaN | 60.0 | 65 |
| 3 | prime | restaurant | 4.20 | низкие | 300.0 | NaN | 50 |
| 11 | хинкальная | fast food | 4.40 | средние | 1000.0 | NaN | 44 |
| 7 | кофепорт | coffee_shop | 4.20 | низкие | NaN | 95.0 | 42 |
| 8 | кулинарная лавка братьев караваевых | cafe | 4.40 | средние | 450.0 | NaN | 39 |
| 10 | теремок | restaurant | 4.10 | средние | 325.0 | NaN | 38 |
| 12 | чайхана | cafe | 4.10 | средние | 400.0 | NaN | 37 |
| 0 | cofefest | coffee_shop | 4.05 | средние | 512.5 | 95.0 | 32 |
| 4 | буханка | bakery | 4.40 | средние | 237.5 | NaN | 32 |
| 9 | му-му | cafe | 4.30 | средние | 450.0 | NaN | 27 |
display('Median rating of chain establishments' ,d2['rating'].median())
display('Median average check of chain establishments' ,d2['middle_avg_bill'].median())
'Median rating of chain establishments'
4.2
'Median average check of chain establishments'
415.0
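Besides their popularity, these chains have several things in common: a median rating of 4.2, a median average check of 415 ₽, and in most cases a mid-range ("средние") price category.
The Top 15 chains in Moscow are led by the coffee shop chain шоколадница (120 locations), followed by the pizzerias домино'с пицца and додо пицца, яндекс лавка (listed as a restaurant), and the budget coffee shops one price coffee and cofix.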
display(f'Total number of establishments - {len(data)}')
'Total number of establishments - 8406'
distr = data.pivot_table(index='category', columns='district_abb', values='name', aggfunc='count')
plt.figure(figsize=(15,15))
sns.set(font_scale=2)
fig = sns.heatmap(distr, annot=True, fmt='g', linewidth=.5, annot_kws={"fontsize":20})
sns.set(font_scale=1)
plt.xticks(rotation=70)
fig.set(xlabel=None, ylabel=None)
fig.set_title('Heat map of categories by districts of Moscow');
distr2 = data.groupby(['district_abb','category'])['lat'].count().reset_index()
fig = px.bar(distr2, x='district_abb', y='lat', color='category',
color_discrete_sequence= px.colors.qualitative.Vivid)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(title='Distribution of establishments by category across districts of Moscow',
yaxis_title='Number of establishments',
xaxis_title='Administrative district of Moscow')
fig.show()
sns.set_style("ticks")
sns.set_palette('Paired')
plt.figure(figsize=(16,6))
order=data.groupby('category').mean().sort_values('rating', ascending=False).index
fig = sns.barplot(x='category', y='rating', data=data, order=order, ci=None)
fig.set_ylim(4,4.5)
fig.set_title('Average ratings by category of establishments')
fig.set_xlabel('Category')
fig.set_ylabel('Rating');
#
moscow_lat, moscow_lng = 55.751244, 37.618423
pth3 = 'admin_level_geomap.geojson'
pth4 = 'datasets/admin_level_geomap.geojson'
if os.path.exists(pth3):
    with open(pth3, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
elif os.path.exists(pth4):
    with open(pth4, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
else:
    state_geo = 'admin_level_geomap.geojson'
rating = data.groupby('district', as_index=False)['rating'].agg('mean')
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
Choropleth(
geo_data=state_geo,
data=rating,
columns=['district', 'rating'],
key_on='feature.name',
fill_color='YlOrRd',
fill_opacity=0.5,
legend_name='Average rating of establishments by district').add_to(m)
m
#
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
marker_cluster = MarkerCluster().add_to(m)
def create_clusters(row):
    icon_url = 'https://img.icons8.com/parakeet/344/restaurant-building.png'
    icon = CustomIcon(icon_url, icon_size=(30,30))
    Marker(
        [row['lat'], row['lng']],
        popup=f"{row['name']} {row['rating']}",
        icon=icon
    ).add_to(marker_cluster)
data.apply(create_clusters, axis=1)
m
street_top = data['street'].value_counts().reset_index()
street_top = street_top.loc[:14,'index']
street_top
0              проспект Мира
1          Профсоюзная улица
2       проспект Вернадского
3         Ленинский проспект
4     Ленинградский проспект
5          Дмитровское шоссе
6            Каширское шоссе
7           Варшавское шоссе
8        Ленинградское шоссе
9                       МКАД
10          Люблинская улица
11            улица Вавилова
12      Кутузовский проспект
13     улица Миклухо-Маклая
14          Пятницкая улица
Name: index, dtype: object
d3 = data.query('street in @street_top')
fig = px.histogram(d3, x='street', color='category',
color_discrete_sequence= px.colors.qualitative.Vivid)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(title='Top 15 streets of Moscow',
xaxis_title='Street names',
yaxis_title='Number of establishments')
fig.show()
street_dull = data['street'].value_counts().reset_index()
street_dull = street_dull.query('street == 1')['index']
d4 = data.query('street in @street_dull')
display(d4.groupby(['district','price']).agg({'category':'first',
'rating':'median',
'middle_avg_bill':'median',
'lat':'count'}).reset_index().sort_values(['district', 'lat'], ascending=False))
| | district | price | category | rating | middle_avg_bill | lat |
|---|---|---|---|---|---|---|
| 28 | Южный административный округ | средние | coffee_shop | 4.20 | 300.0 | 10 |
| 26 | Южный административный округ | выше среднего | cafe | 4.35 | 1200.0 | 2 |
| 27 | Южный административный округ | низкие | canteen | 3.90 | 162.5 | 2 |
| 25 | Южный административный округ | высокие | restaurant | 4.10 | 3000.0 | 1 |
| 24 | Юго-Западный административный округ | средние | fast food | 4.00 | 562.5 | 2 |
| 23 | Юго-Западный административный округ | низкие | cafe | 3.70 | 100.0 | 1 |
| 22 | Юго-Восточный административный округ | средние | cafe | 4.30 | 512.5 | 9 |
| 21 | Юго-Восточный административный округ | выше среднего | bar,pub | 4.30 | 1500.0 | 3 |
| 20 | Юго-Восточный административный округ | высокие | fast food | 4.40 | 2750.0 | 1 |
| 19 | Центральный административный округ | средние | coffee_shop | 4.30 | 462.5 | 41 |
| 16 | Центральный административный округ | высокие | bar,pub | 4.50 | 2250.0 | 16 |
| 17 | Центральный административный округ | выше среднего | restaurant | 4.40 | 1300.0 | 13 |
| 18 | Центральный административный округ | низкие | restaurant | 4.20 | 350.0 | 1 |
| 14 | Северо-Западный административный округ | выше среднего | restaurant | 4.40 | 1250.0 | 3 |
| 15 | Северо-Западный административный округ | средние | restaurant | 4.30 | 1210.0 | 3 |
| 13 | Северо-Восточный административный округ | средние | coffee_shop | 4.20 | 375.0 | 15 |
| 11 | Северо-Восточный административный округ | выше среднего | restaurant | 4.35 | 1400.0 | 2 |
| 12 | Северо-Восточный административный округ | низкие | coffee_shop | 4.20 | 175.0 | 2 |
| 10 | Северо-Восточный административный округ | высокие | restaurant | 4.30 | 3250.0 | 1 |
| 9 | Северный административный округ | средние | canteen | 4.30 | 412.5 | 13 |
| 8 | Северный административный округ | низкие | coffee_shop | 3.95 | 207.5 | 4 |
| 7 | Северный административный округ | выше среднего | bar,pub | 4.30 | 1500.0 | 3 |
| 6 | Северный административный округ | высокие | restaurant | 4.30 | 1750.0 | 2 |
| 5 | Западный административный округ | средние | bar,pub | 4.20 | 600.0 | 11 |
| 3 | Западный административный округ | выше среднего | cafe | 4.25 | 1250.0 | 2 |
| 2 | Западный административный округ | высокие | bar,pub | 4.10 | 1750.0 | 1 |
| 4 | Западный административный округ | низкие | cafe | 4.00 | 150.0 | 1 |
| 1 | Восточный административный округ | средние | bar,pub | 4.30 | 600.0 | 16 |
| 0 | Восточный административный округ | выше среднего | cafe | 4.10 | 1150.0 | 1 |
fig = px.histogram(d4, x='district_abb', color='category',
color_discrete_sequence= px.colors.qualitative.Vivid)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(title='Distribution of streets with a single institution by districts and categories',
xaxis_title='Administrative district of Moscow',
yaxis_title='Number of establishments')
fig.show()
display(f'Number of streets with a single catering facility - {len(street_dull)}')
display('Median rating of individual establishments' ,d4['rating'].median())
display('Median average check of individual establishments' ,d4['middle_avg_bill'].median())
'Number of streets with a single catering facility - 458'
'Median rating of individual establishments'
4.3
'Median average check of individual establishments'
625.0
Moscow has 458 streets with a single establishment. Typical for such establishments: a median rating of 4.3 and a median average check of 625 ₽.
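Selecting streets with a single establishment can also be written without relying on the column names produced by `value_counts().reset_index()`, which differ between pandas versions. The sketch below uses a hypothetical helper, `single_establishment_streets`, on made-up rows.

```python
import pandas as pd

def single_establishment_streets(df):
    """Return the streets that occur exactly once in the `street` column."""
    counts = df['street'].value_counts()
    return counts[counts == 1].index

# Made-up rows: two establishments share a street, two stand alone
toy = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'street': ['проспект Мира', 'проспект Мира',
               'Певческий переулок', 'улица Вавилова'],
})

lonely = single_establishment_streets(toy)
subset = toy[toy['street'].isin(lonely)]
print(subset)
```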
#
moscow_lat, moscow_lng = 55.751244, 37.618423
pth3 = 'admin_level_geomap.geojson'
pth4 = 'datasets/admin_level_geomap.geojson'
if os.path.exists(pth3):
    with open(pth3, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
elif os.path.exists(pth4):
    with open(pth4, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
else:
    state_geo = 'datasets/admin_level_geomap.geojson'
distr_bill = data.groupby('district', as_index=False)['middle_avg_bill'].agg('median')
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
Choropleth(
geo_data=state_geo,
data=distr_bill,
columns=['district', 'middle_avg_bill'],
key_on='feature.name',
fill_color='YlOrRd',
fill_opacity=0.5,
legend_name='Median price of establishments by district').add_to(m)
m
Distance from the center alone does not determine the average check.
The average check is driven by the status of the district. In more expensive, elite, or historical areas prices are higher: the Eastern AD is on a par with the Central AD here.
#
moscow_lat, moscow_lng = 55.751244, 37.618423
pth3 = 'admin_level_geomap.geojson'
pth4 = 'datasets/admin_level_geomap.geojson'
if os.path.exists(pth3):
    with open(pth3, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
elif os.path.exists(pth4):
    with open(pth4, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
else:
    state_geo = 'datasets/admin_level_geomap.geojson'
# The mean of a boolean column is the share of True values
is24_distr = data.groupby('district', as_index=False)['is_24/7'].agg('mean')
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
Choropleth(
geo_data=state_geo,
data=is24_distr,
columns=['district', 'is_24/7'],
key_on='feature.name',
fill_color='YlOrRd',
fill_opacity=0.5,
legend_name='Share of 24/7 establishments by district').add_to(m)
m
The map of round-the-clock establishments is the mirror image of the price and rating maps.
The less affluent the district, the larger its share of round-the-clock establishments: Eastern AD, South-Eastern AD.
The Central AD has the smallest share of round-the-clock establishments.
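The choropleth values are the mean of a boolean column per district, which equals the share of 24/7 establishments. A minimal sketch on made-up rows (the flags below are illustrative, not real figures):

```python
import pandas as pd

# Made-up flags for two districts
toy = pd.DataFrame({
    'district_abb': ['ЦАО', 'ЦАО', 'ЦАО', 'ЦАО', 'ВАО', 'ВАО'],
    'is_24/7': [False, False, False, True, True, False],
})

# mean() of booleans = fraction of True; multiply by 100 for percent
share_247 = (toy.groupby('district_abb')['is_24/7']
                .mean()
                .mul(100)
                .sort_values(ascending=False))
print(share_247)
```

Expressing the result in percent makes the district comparison easier to read than raw fractions.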
# Let's determine the value of the "bad rating"
data['rating'].describe()
count    8406.000000
mean        4.229895
std         0.470348
min         1.000000
25%         4.100000
50%         4.300000
75%         4.400000
max         5.000000
Name: rating, dtype: float64
data['rating_rate'] = data['rating'].apply(lambda x: 'low' if x<=4 else 'high')
fig = px.histogram(data[data['rating_rate'] == 'low'], y='district_abb', color='category',
color_discrete_sequence= px.colors.qualitative.Vivid)
fig.update_layout(yaxis={'categoryorder':'total descending'})
fig.update_layout(title='Distribution of establishments with poor ratings by districts and categories',
xaxis_title='Number of establishments',
yaxis_title='Administrative district of Moscow ')
fig.show()
d5 = data.groupby(['category','rating_rate'])['middle_avg_bill'].median().reset_index()
fig = px.bar(d5, x='middle_avg_bill', y='category', color='rating_rate',
color_discrete_sequence= px.colors.qualitative.Vivid, barmode='stack')
fig.update_layout(yaxis={'categoryorder':'total descending'})
fig.update_layout(title='Average check of establishments with poor ratings by category',
xaxis_title='Median average check',
yaxis_title='Institution category ')
fig.show()
coffee = data[data['category']=='coffee_shop']
display(f'Number of coffee shops in the dataset', coffee['name'].count())
'Number of coffee shops in the dataset'
1413
#
moscow_lat, moscow_lng = 55.751244, 37.618423
pth3 = 'admin_level_geomap.geojson'
pth4 = 'datasets/admin_level_geomap.geojson'
if os.path.exists(pth3):
    with open(pth3, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
elif os.path.exists(pth4):
    with open(pth4, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
else:
    state_geo = 'datasets/admin_level_geomap.geojson'
coffee_distr = coffee.groupby('district', as_index=False)['name'].agg('count')
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
marker_cluster = MarkerCluster().add_to(m)
def create_clusters(row):
    icon_url = 'https://img.icons8.com/parakeet/344/restaurant-building.png'
    icon = CustomIcon(icon_url, icon_size=(30,30))
    Marker(
        [row['lat'], row['lng']],
        popup=f"{row['name']} {row['rating']}",
        icon=icon
    ).add_to(marker_cluster)
coffee.apply(create_clusters, axis=1)
Choropleth(
geo_data=state_geo,
data=coffee_distr,
columns=['district', 'name'],
key_on='feature.name',
fill_color='YlOrRd',
fill_opacity=0.5,
legend_name='Number of coffee shops in the dataset').add_to(m)
m
display('Number of 24-hour coffee shops in Moscow:', len(coffee[coffee['is_24/7'] == True]))
'Number of 24-hour coffee shops in Moscow:'
59
plt.figure(figsize=(16,6))
order=coffee.groupby('district_abb').mean().sort_values('rating', ascending=False).index
fig = sns.barplot(x='district_abb', y='rating', data=coffee, order=order, ci=None)
fig.set_ylim(4,4.5)
fig.set_title('Average coffee shop rating by district')
fig.set_xlabel('Administrative District of Moscow')
fig.set_ylabel('Rating of establishments');
#
moscow_lat, moscow_lng = 55.751244, 37.618423
pth3 = 'admin_level_geomap.geojson'
pth4 = 'datasets/admin_level_geomap.geojson'
if os.path.exists(pth3):
    with open(pth3, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
elif os.path.exists(pth4):
    with open(pth4, 'r', encoding='utf-8') as f:
        state_geo = json.load(f)
else:
    state_geo = 'datasets/admin_level_geomap.geojson'
coffee_rat = coffee.groupby('district', as_index=False)['rating'].agg('mean')
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
Choropleth(
geo_data=state_geo,
data=coffee_rat,
columns=['district', 'rating'],
key_on='feature.name',
fill_color='YlOrRd',
fill_opacity=0.5,
legend_name='Average rating of coffee shops by district').add_to(m)
m
plt.figure(figsize=(16,6))
order=coffee.groupby('district_abb').mean().sort_values('middle_coffee_cup', ascending=False).index
fig = sns.barplot(x='district_abb', y='middle_coffee_cup', data=coffee, order=order, ci=None)
fig.set_ylim(130,200)
fig.set_title('Average cost of a cup of cappuccino by district')
fig.set_xlabel('Administrative District of Moscow')
fig.set_ylabel('Average cost of a cup');
The dataset contains 1413 coffee shops.
Most coffee shops are in the CAD - 425, NAD - 190, NEAD - 148, WAD - 142
There are 59 round-the-clock coffee shops in Moscow.
Coffee shops are rated best in the CAD and NWAD and lowest in the WAD, SAD and NEAD.
For a new coffee shop, consider the districts with the lowest ratings and the fewest coffee shops, i.e. the least competitive environment:
WAD, NEAD, SAD
The average cost of a cup of cappuccino in these areas:
WAD - 195₽
NEAD - 165₽
SAD - 160₽
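The district comparison above combines three measures: the number of coffee shops, their mean rating, and the price of a cup. One way to see them side by side is sketched below with named aggregation; the frame `coffee_toy` and all its values are made up for illustration.

```python
import pandas as pd

# Made-up coffee shops in three districts
coffee_toy = pd.DataFrame({
    'district_abb': ['ЦАО', 'ЦАО', 'ЦАО', 'ЗАО', 'ЗАО', 'ЮАО'],
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'rating': [4.5, 4.4, 4.3, 4.0, 4.0, 4.2],
    'middle_coffee_cup': [200.0, 180.0, None, 190.0, 200.0, 160.0],
})

# One row per district; weakest-rated districts come first
summary = (coffee_toy.groupby('district_abb')
           .agg(shops=('name', 'count'),
                mean_rating=('rating', 'mean'),
                median_cup=('middle_coffee_cup', 'median'))
           .sort_values(['mean_rating', 'shops']))
print(summary)
```

Sorting by rating and then by shop count puts the districts with the weakest competition at the top of the table.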
Consider coffee shops in the areas selected above.
We look for streets with the fewest competitors, i.e. streets with only one establishment
coffee_adv1 = coffee.query('district_abb in ("ЗАО", "ЮАО", "СВАО") & street in @street_dull')
display(f'The number of streets in the WAD, SAD, NEAD with a single institution - {len(coffee_adv1)}')
'The number of streets in the WAD, SAD, NEAD with a single institution - 17'
#
moscow_lat, moscow_lng = 55.751244, 37.618423
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
marker_cluster = MarkerCluster().add_to(m)
def create_clusters(row):
    icon_url = 'https://img.icons8.com/parakeet/344/restaurant-building.png'
    icon = CustomIcon(icon_url, icon_size=(30,30))
    Marker(
        [row['lat'], row['lng']],
        popup=f"{row['name']} {row['rating']} {row['street']}",
        icon=icon
    ).add_to(marker_cluster)
coffee_adv1.apply(create_clusters, axis=1)
m
With only 17 such streets, it will be easy to pick those with the highest foot traffic using third-party data
Now consider the 200 most popular streets in Moscow that have no coffee shops
street_pop = data['street'].value_counts().reset_index()
# Select the most popular streets of Moscow
# (.loc slicing is label-inclusive, so :200 keeps rows 0..200, i.e. 201 streets)
street_pop = street_pop.loc[:200,'index']
coffee_no = data.query('street in @street_pop').pivot_table(index='street',
columns = 'category',
values = 'name',
aggfunc = 'count').reset_index().sort_values('coffee_shop')
# Keep only the streets without coffee shops (NaN != NaN, so the query matches missing counts)
coffee_no = coffee_no.query('coffee_shop != coffee_shop')['street']
coffee_adv2 = data.query('street in @coffee_no')
display(f'The number of popular streets in Moscow without coffee shops', len(coffee_adv2.groupby('street')))
'The number of popular streets in Moscow without coffee shops'
17
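The filter `coffee_shop != coffee_shop` works because NaN is not equal to itself: streets with no coffee shop get a NaN count in the pivot table. The sketch below shows this on made-up rows next to a more explicit `isna()` version; the frame `toy` is an illustrative assumption.

```python
import pandas as pd

# Made-up rows: street B has no coffee shop
toy = pd.DataFrame({
    'street': ['A', 'A', 'B', 'B', 'C'],
    'category': ['cafe', 'coffee_shop', 'cafe', 'restaurant', 'coffee_shop'],
    'name': ['x', 'y', 'z', 'w', 'v'],
})

counts = toy.pivot_table(index='street', columns='category',
                         values='name', aggfunc='count')

# Explicit version: missing count means no coffee shop on the street
no_coffee = counts[counts['coffee_shop'].isna()].index.tolist()

# Equivalent NaN self-inequality trick, as used in the cell above
no_coffee_query = (counts.reset_index()
                         .query('coffee_shop != coffee_shop')['street']
                         .tolist())
print(no_coffee, no_coffee_query)
```

Both approaches select the same streets; `isna()` states the intent more directly.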
#
moscow_lat, moscow_lng = 55.751244, 37.618423
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
marker_cluster = MarkerCluster().add_to(m)
def create_clusters(row):
    icon_url = 'https://img.icons8.com/parakeet/344/restaurant-building.png'
    icon = CustomIcon(icon_url, icon_size=(30,30))
    Marker(
        [row['lat'], row['lng']],
        popup=f"{row['name']} {row['rating']} {row['street']}",
        icon=icon
    ).add_to(marker_cluster)
coffee_adv2.apply(create_clusters, axis=1)
m
17 streets is also a small number, so choosing among them will be easy.
The selection can be narrowed further to the districts chosen above.
Alternatively, depending on the business plan, focus on districts by the price of a cup.
Presentation: https://disk.yandex.ru/i/kGR5OggR9x2a4g